dgit.raspbian.org Git

add function for obtaining highest possible memory address

Add a function for obtaining the highest possible physical memory
address of the system. This value is influenced by:

- hypervisor configuration (CONFIG_BIGMEM)
- processor capability (max. addressable physical memory)
- memory map at boot time
- memory hotplug capability

Add this value to xen_sysctl_physinfo in order to enable dom0 to do a
proper sizing of grant frame limits of guests.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

x86emul: properly refuse LOCK on most 0FC7 insns

When adding support for RDRAND/RDSEED/RDPID I didn't remember to also
update this special early check. Make it (hopefully) future-proof by
also refusing VEX-encodings.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

python/libxc: extend the call to get/set cap for credit2

Commit 68817024 ("xen: credit2: allow to set and get utilization cap")
added a new parameter. Implement it for the python binding as well.

Coverity-ID: 1418532

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

xen/credit2: add missing unlock

Coverity-ID: 1418531

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Merge remote-tracking branch 'origin/staging' into staging

Introduce migration precopy policy

This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

Tidy libxc xc_domain_save

Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

Config.mk: update OVMF changeset

Update to allow to build OVMF with GCC 7.2.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl/xl: allow to get and set cap on Credit2.

Note that a cap is considered valid only if
it is within the [1, nr_vcpus]% interval.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xen: credit2: improve distribution of budget (for domains with caps)

Instead of letting the vCPU that for first tries to get
some budget take it all (although temporarily), allow each
vCPU to only get a specific quota of the total budget.

This improves fairness, allows for more parallelism, and
prevents vCPUs from not being able to get any budget (e.g.,
because some other vCPU always comes before and gets it all)
for one or more period, and hence starve (and cause troubles
in guest kernels, such as livelocks, triggering of whatchdogs,
etc.).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

xen: credit2: allow to set and get utilization cap

As cap is already present in Credit1, as a parameter, all
the wiring is there already for it to be percolate down
to csched2_dom_cntl() too.

In this commit, we actually deal with it, and implement
setting, changing or disabling the cap of a domain.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/pv: Fix assertion failure in pv_emulate_privileged_op()

The ABI of {read,write}_msr() requires them to use x86_emul_hw_exception() if
they report an exception with the emulator core.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/levelling: Avoid NULL pointer dereference

Coverity points out that next is indeed NULL at times. Only try to read the
.cpuid_faulting field when we sure that next isn't NULL.

Fixes e7a370733bd "x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/hvm/ioreq: use bool rather than bool_t

This patch changes use of bool_t to bool in the ioreq server code. It also
fixes an incorrect indentation in a continuation line.

This patch is purely cosmetic. No semantic or functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm/ioreq: rename .*pfn and .*gmfn to .*gfn

Since ioreq servers are only relevant to HVM guests and all the names in
question unequivocally refer to guest frame numbers, name them all .*gfn
to avoid any confusion.

This patch is purely cosmetic. No semantic or functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86emul/test: generate non-pie executable for 64bit builds

PIE may (and commonly will) result in the binary being loaded above
the 4Gb boundary, which can't work with at least the VZEROUPPER compat
mode test.

Add -fno-PIE and -no-pie when appropriate so that gcc won't generate a
PIE executable.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

python: Add binding for non-blocking xs_check_watch()

xs_check_watch() checks for watch notifications without blocking.
Together with the binding for xs_fileno(), this makes it possible
to write event-driven clients in Python.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

python: Extract registered watch search logic from xspy_read_watch()

When a watch fires, xspy_read_watch() checks whether the client has
registered interest in the path which changed and, if so, returns the
path and a client-supplied token. The binding for xs_check_watch()
needs to do the same, so this patch extracts the search code into a
separate function.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

python: Add binding for xs_fileno()

xs_fileno() returns a file descriptor which receives events when Xenstore
watches fire. Exposing this in the Python bindings is a prerequisite
for writing event-driven clients in Python.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

x86/msr: introduce guest_wrmsr()

The new function is responsible for handling WRMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if MSR
is being handled by guest_wrmsr() then WRMSR will either succeed (if
a guest is allowed to access it and provided a correct value based on
its MSR policy) or produce a GP fault. A guest will never see
a successful WRMSR of some MSR unknown to this function.

guest_wrmsr() unifies and replaces the handling code from
vmx_msr_write_intercept() and priv_op_write_msr().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
This also fixes a bug on AMD hardware where a guest which tries to
enable CPUID faulting via a direct write to the MSR will observe it
appearing to succeed, but because Xen actually ignored the write.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce guest_rdmsr()

The new function is responsible for handling RDMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if MSR
is being handled by guest_rdmsr() then RDMSR will either succeed (if
a guest is allowed to access it based on its MSR policy) or produce
a GP fault. A guest will never see a H/W value of some MSR unknown to
this function.

guest_rdmsr() unifies and replaces the handling code from
vmx_msr_read_intercept() and priv_op_read_msr().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
This (along with the prep work in init_domain_msr_policy()) also fixes
a bug where Dom0 could probe and find CPUID faulting, even though it
couldn't actually use it.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy

Since each vCPU now has struct msr_vcpu_policy, use cpuid_faulting bit
from there in current logic and remove arch_vcpu::cpuid_faulting.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce struct msr_vcpu_policy

The new structure contains information about guest's MSRs that are
unique to each vCPU. It starts with only 1 MSR:

MSR_INTEL_MISC_FEATURES_ENABLES

Which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. Availability of MSR_INTEL_MISC_FEATURES_ENABLES depends on
availability of MSR_INTEL_PLATFORM_INFO.

Add init_vcpu_msr_policy() which sets initial MSR policy for every vCPU
during domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce struct msr_domain_policy

The new structure contains information about guest's MSRs that are
shared between all domain's vCPUs. It starts with only 1 MSR:

MSR_INTEL_PLATFORM_INFO

Which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. It's always possible to emulate CPUID faulting for HVM guests
while for PV guests the H/W support is required.

Add init_domain_msr_policy() which sets initial MSR policy during
domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/mm: move declaration of new_guest_cr3 to local pv/mm.h

It is only used by PV. The code can only be moved together with other
PV mm code.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move PV l4 table setup code

Move the code to pv/mm.c. Export pv_arch_init_memory via global
pv/mm.h and init_guest_l4_table via local pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: factor out pv_arch_init_memory

Move the split l4 setup code into the new function. It can then be
moved to pv/ in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move and rename map_ldt_shadow_page

Add pv prefix to it. Move it to pv/mm.c. Fix call sites.

Take the chance to change v to curr and d to currd.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move compat descriptor table manipulation code

Move them alongside the non-compat variants.

Change u{32,64} to uint{32,64}_t and the type of ret from long to int
while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: split out descriptor table manipulation code

Move the code to pv/descriptor-tables.c. Change u64 to uint64_t while
moving. Use currd in do_update_descriptor.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: add pv prefix to {set,destroy}_gdt

Unfortunately they can't stay local to PV code because domain.c still
needs them. Change their names and fix up call sites. The code will be
moved later together with other descriptor table manipulation code.

Also move the declarations to pv/mm.h and provide stubs.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: split out pv grant table code

Move the code to pv/grant_table.c. Nothing needs to be done with
regard to headers.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move map_guest_l1e to pv/mm.c

And export the function via pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: remove the now unused inclusion of pv/emulate.h

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move ro page fault emulation code

Move the code to pv/ro-page-fault.c. Create a new header file
asm-x86/pv/mm.h and move the declaration of pv_ro_page_fault there.
Include the new header file in traps.c.

Fix some coding style issues. The prefixes (ptwr and mmio) are
retained because there are two sets of emulation code in the same file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move {un,}adjust_guest_l*e to pv/mm.h

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move update_intpte to pv/mm.h

While at it, change the type of preserve_ad to bool. Also move
UPDATE_ENTRY there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: export get_page_from_mfn

It will be used later in multiple files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move guest_get_eff_l1e to pv/mm.h

Make it static inline. It will be used by map_ldt_shadow_page and ro
page fault emulation code later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

libxl: remove list callback from device framework

As we have generic functions to get device list
(libxl__device_list) no need to have callback in
the framework. To resolve issue when XS entry
doesn't match device name, device type is
extended with field "entry" which keeps XS entry.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

x86/PSR: fix cmdline option handling

Commit 0ade5e causes a bug that the psr features presented in cmdline
cannot be correctly enumerated.
1. If there is only 'psr=', then CMT is enumerated which is not right.
2. If cmdline is 'psr=cmt,cat,cdp,mba', only the last feature is enumerated.

This patch fixes the issues.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

custom parameter handling fixes

The recent changes to their handling introduced a few false warnings,
due to checks looking at the wrong string slot. While going through all
those commits and looking for patterns similar to the "dom0_mem=" I've
noticed this with, I also realized that there were other issues with
"dom0_nodes=" and "rmrr=", partly pre-existing, but partly also due to
those recent changes not having gone far enough.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxl: provide typedefs for device framework functions

Use the new typedefs to avoid copy-n-paste everywhere.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

x86/mem_access: fix mis-indented line

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

perfc: fix build after commit fc32575968 when CONFIG_PERF_COUNTERS=y

The commit fc32575968 "public/sysctl: drop unnecessary typedefs and
handles" went a bit too far by replacing all xen_systcl_*_t type to
struct xen_sysctl_*.

However, xen_sysctl_perfc_val_t was a typedef on uint32_t and therefore
is not associated to a structure.

Use xen_sysctl_perfc_val_t to fix the build when CONFIG_PERF_COUNTERS=y

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

grant-table: simplify get_paged_frame

The implementation of get_paged_frame is currently different whether the
architecture support sharing memory or paging memory. Both
version are extremely similar so it is possible to consolidate in a
single implementation.

The main difference is the x86 version will allow grant on foreign page
when using HVM/PVH whilst Arm does not. At the moment, on x86 foreign pages
are only allowed for PVH Dom0. It seems that foreign pages should never
be granted so deny them

The check for shared/paged memory are now gated with the respective ifdef.
Potentially, dummy p2m_is_shared/p2m_is_paging could be implemented for
Arm.

Lastly remove pointless parenthesis in the code modified.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

xen: credit2: fix spinlock irq-safety violation

In commit ad4b3e1e9df34 ("xen: credit2: implement
utilization cap") xfree() was being called (for
deallocating the budget replenishment timer, during
domain destruction) inside an IRQ disabled critical
section.

That must not happen, as it uses the mem-pool's lock,
which needs to be taken with IRQ enabled. And, in fact,
we crash (in debug builds):

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************

Let's, therefore, kill and deallocate the timer outside of
the critical sections, when IRQs are enabled.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

xen/arm: page: Prefix memory types with MT_

This will avoid confusion in the code when using them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: page: Use directly BUFFERABLE and drop DEV_WC

DEV_WC is only used for PAGE_HYPERVISOR_WC and does not bring much
improvement.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: page: Remove unused attributes DEV_NONSHARED and DEV_CACHED

They were imported from non-LPAE Linux, but Xen is LPAE only. It is time
to do some clean-up in the memory attribute and keep only what make
sense for Xen. Follow-up patch will do more clean-up.

Also, update the comment saying our attribute matches Linux.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

public/sysctl: drop unnecessary typedefs and handles

By virtue of the struct xen_sysctl container structure, most of them
are really just cluttering the name space.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>

public/domctl: drop unnecessary typedefs and handles

By virtue of the struct xen_domctl container structure, most of them
are really just cluttering the name space.

While doing so,
- convert an enum typed (pt_irq_type_t) structure field to a fixed
width type,
- make x86's paging_domctl() and descendants take a properly typed
handle,
- add const in a few places.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>

x86/hvm: break out __hvm_copy()'s translation logic

It will be reused by later changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm: rename enum hvm_copy_result to hvm_translation_result

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

add new domctl hypercall to set grant table resource limits

Add a domctl hypercall to set the domain's resource limits regarding
grant tables.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

clean up grant_table.h

Many definitions can be moved from xen/grant_table.h to
common/grant_table.c now, as they are no longer used in other sources.

Rearrange the elements of struct grant_table to minimize holes due to
alignment of elements.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

move XENMAPSPACE_grant_table code into grant_table.c

The x86 and arm versions of XENMAPSPACE_grant_table handling are nearly
identical. Move the code into a function in grant_table.c and add an
architecture dependant hook to handle the differences.

Switch to mfn_t in order to be more type safe.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

x86/vvmx: add hvm_intsrc_vector support to nvmx_intr_intercept()

Under the following circumstances:

1. L1 doesn't enable PAUSE exiting or PAUSE-loop exiting controls
2. L2 executes PAUSE in a loop with RFLAGS.IE == 0

L1's PV IPI through event channel will never reach the target L1's vCPU
which runs L2 because nvmx_intr_intercept() doesn't know about
hvm_intsrc_vector. This leads to infinite L2 loop without nested
vmexits and can cause L1 to hang.

The issue is easily reproduced with Qemu/KVM on CentOS-7-1611 as L1
and an L2 guest with SMP.

Fix nvmx_intr_intercept() by injecting hvm_intsrc_vector irq into L1
which will cause nested vmexit.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

xen/arm: Replace ioremap_attr(PAGE_HYPERVISOR_NOCACHE) call by ioremap_nocache

ioremap_cache is a wrapper of ioremap_attr(...).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Improve logging for data/prefetch abort fault

Walk the hypervisor page table for data/prefetch abort fault to help
diagnostics error in the page tables.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Introduce a helper to read the hypersivor fault register

While ARM32 has 2 distinct registers for the hypervisor fault register
(one for prefetch abort, the other for data abort), AArch64 has only
one.

Currently, the logic is open-code but a follow-up patch will require to
read it too. So move the logic in a separate helper and use it instead
of open-coding it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Introduce hsr_xabt to gather common bits between hsr_{d,i}abt

This will allow to consolidate some part of the data abort and prefetch
abort handling in a single function later on.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Add FnV field in hsr_*abt

FnV (FAR not Valid) bit was introduced by ARMv8 in both AArch32 and
AArch64 (See D7-2275, D7-2277, G6-4958, G6-4962 in ARM DDI 0487B.a).

Note the new revision of ARMv8 defined more bits in HSR. They haven't
been added at the moment because we have no use of them in Xen.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: arm32: Don't define FAR_EL1

Aliasing FAR_EL1 to IFAR is wrong because on ARMv8 FAR_EL1[31:0] is
architecturally mapped to DFAR and FAR_EL1[63:32] to IFAR.

As FAR_EL1 is not currently used in ARM32 code, remove it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Don't define FAR_EL2 for ARM32

Aliasing FAR_EL2 to HIFAR makes the code confusing because on ARMv8
FAR_EL2[31:0] is architecturally mapped to HDFAR and FAR_EL2[63:32] to
HIFAR. See D7.2.30 in ARM DDI 0487B.a. Open-code the alias instead.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: hsr_iabt: Document RES0 field

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: mm: Redefine mfn_to_virt to use typesafe

This add a bit more safety in the memory subsystem code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

MAINTAINERS: Add public/arch-arm.h under the ARM subsystem

The header public/arch-arm.h contains mostly ARM specific code. Avoid CC
the "THE REST" maintainers on it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

correct gnttab_get_status_frames()

In gnttab_get_status_frames() all accesses to nr_status_frames should
be done with the grant table lock held.

While at it correct coding style: labels should be indented by one
space.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

mm: scrub pages returned back to heap if MEMF_no_scrub is set

Set free_heap_pages()'s need_scrub to true if alloc_domheap_pages()
returns pages back to heap as result of assign_pages() error when those
pages were requested with MEMF_no_scrub flag.

We need to do this because there is a possibility that
alloc_heap_pages() might clear buddy's PGC_need_scrubs flag without
actually clearing the page.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

hvmloader: Use MB(x) and GB(x) macros

instead of hardcoding values. We also drop the uint64_t
cast as the macro uses ULL to produce the proper width
types.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xl: avoid leaking memory in vdispl parser

Coverity-ID: 1418095

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: use libxl__read_xenstore_mandatory in vtpm function

libxl__read_xenstore can return NULL. Use the _mandatory variant to
return early when the read fails.

Coverity-ID: 1418098

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: use libxl__read_xenstore_mandatory in vdispl function

Coverity-ID: 1418097

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86emul: re-order checks in test harness

On older systems printing the "n/a" messages (resulting from the
compiler not being new enough to deal with some of the test code) isn't
very useful: If both CPU and compiler are too old for a certain test,
we can as well omit those messages, as those tests wouldn't be run even
if the compiler did produce code. (This has become obvious with the
3DNow! tests, which I had to run on an older system still supporting
those insns, and that system naturally also had an older compiler.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

mm: use typesafe MFN for alloc_boot_pages return

At the moment, most of the callers will have to use mfn_x. However
follow-up patches will remove some of them by propagating the typesafe a
bit further.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

x86/cpuidle: add new CPU families

Bring code up-to-date with SDM version 063.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

VMX: add new CPU families to LBR handling

Bring code up-to-date with SDM version 063, including the LBR format
enumeration.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

VMX: convert CPU family numbers to hex

This makes it easier to match them against SDM updates. Also update a
few comments with names as per SDM version 063.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

xen/arm: p2m: Check for p2m->domain to be initialized before releasing resources

Since p2m_teardown() can be called when p2m_init() haven't executed yet
we might deal with unitialized list "p2m->pages" which leads to crash.
To avoid this use back pointer to domain as end-of-initialization indicator.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: vgic: Check for vgic handler to be initialized before dereferencing it

Since domain_vgic_free() can be called when the vgic_ops haven't been
initialised yet, always check that d->arch.vgic.handler is not a null.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

arm: use plain bool in various headers

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

arm/vtimer.c: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/smpboot.c: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/decode.c: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/bootfdt.c: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/cpu{errata,feature}.[ch]: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

arm/alternative.c: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/platform: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/irq: switch to plain bool

Also removed a redundant pair of brackets.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/domain_build: switch to plain bool

Also fixed a coding style issue.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

arm/{v,}gic: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

arm/acpi: switch to plain bool

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

xen/arm: Limit the scope of cpregs.h

Currently, cpregs.h is indirectly included every files of the hypervisor even
for arm64. However, the only use for arm64 is when emulating co-processors.

For arm32, all users of processor.h expect cpregs.h to be included in
order to access co-processors. So move the inclusion in
asm-arm/arm32/processor.h.

cpregs.h will also be directly included in the co-processors emulation
to accommodate arm64.

This is drastically reducing the exposure of cpregs.h to any source file
on arm64.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Move sysregs.h in arm64 sub-directory

sysregs.h contains only code protected by #ifdef CONFIG_ARM_64. Move it
in arm64 sub-directory to reflect that and remove the #ifdef.

At the same time, fixup the guards.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Move co-processor emulation outside of traps.c

The co-processor emulation is quite big and pretty much standalone. Move
it in a separate file to shrink down the size of traps.c.

At the same time remove unused cpregs.h.

No functional change.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Move sysreg emulation outside of traps.c

The sysreg emulation is 64-bit specific and surrounded by #ifdef. Move
them in a separate file arm/arm64/vsysreg.c to shrink down a bit traps.c

No functional change.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Export a bunch of helpers to handle emulation

A follow-up patch will move some parts of traps.c in separate files.
The will require to use helpers that are currently statically defined.
Export the following helpers:
    - inject_undef64_exception
    - inject_undef_exception
    - check_conditional_instr
    - advance_pc
    - handle_raz_wi
    - handle_wo_wi
    - handle_ro_raz

Note that asm-arm/arm32/traps.h is empty but it is to keep parity with
the arm64 counterpart.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen: credit2: implement utilization cap

This commit implements the Xen part of the cap mechanism for
Credit2.

A cap is how much, in terms of % of physical CPU time, a domain
can execute at most.

For instance, a domain that must not use more than 1/4 of
one physical CPU, must have a cap of 25%; one that must not
use more than 1+1/2 of physical CPU time, must be given a cap
of 150%.

Caps are per domain, so it is all a domain's vCPUs, cumulatively,
that will be forced to execute no more than the decided amount.

This is implemented by giving each domain a 'budget', and
using a (per-domain again) periodic timer. Values of budget
and 'period' are chosen so that budget/period is equal to the
cap itself.

Budget is burned by the domain's vCPUs, in a similar way to
how credits are.

When a domain runs out of budget, its vCPUs can't run any
longer. They can gain, when the budget is replenishment by
the timer, which event happens once every period.

Blocking the vCPUs because of lack of budget happens by
means of a new (_VPF_parked) pause flag, so that, e.g.,
vcpu_runnable() still works. This is similar to what is
done in sched_rtds.c, as opposed to what happens in
sched_credit.c, where vcpu_pause() and vcpu_unpause()
(which means, among other things, more overhead).

Note that, while adding new fields to csched2_vcpu and
csched2_dom, currently existing members are being moved
around, to achieve best placement inside cache lines.

Note also that xenalyze and tools/xentrace/format are being
updated too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

x86/mm: initialize ol1e in create_grant_pv_mapping() for older compilers

On gcc 4.4.4:

mm.c: In function \91create_grant_pv_mapping\92:
mm.c:3839: error: \91ol1e.l1\92 may be used uninitialized in this function

While ol1e would not be used uninitialized (because rc needs to be properly
set) we have to accommodate these older compliers.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>